Linked knowledge sources for topic classification of microposts: a semantic graph-based approach

Abstract

Short text messages, a.k.a. Microposts (e.g. Tweets), have proven to be an effective channel for revealing information about trends and events, ranging from those related to Disaster (e.g. hurricane Sandy) to those related to Violence (e.g. Egyptian revolution). Being informed about such events as they occur could be extremely important to authorities and emergency professionals, since it allows them to respond immediately. In this work we study the problem of topic classification (TC) of Microposts, which aims to automatically classify short messages based on the subject(s) discussed in them. Accurate TC of Microposts is, however, a challenging task, since the limited number of tokens in a post often implies a lack of sufficient contextual information. To provide contextual information to Microposts, we present and evaluate several graph structures surrounding concepts present in linked knowledge sources (KSs). Traditional TC techniques enrich the content of Microposts with features extracted only from the Microposts' own content. In contrast, our approach relies on the generation of different weighted semantic meta-graphs extracted from linked KSs. We introduce a new semantic graph, called the category meta-graph. This meta-graph provides a more fine-grained categorisation of concepts, yielding a set of novel semantic features. Our findings show that such category meta-graph features effectively improve the performance of a topic classifier of Microposts. Furthermore, we aim to understand which semantic features contribute to the performance of a topic classifier. To this end, we propose an approach for automatically estimating the accuracy loss of a topic classifier on new, unseen Microposts. We introduce and evaluate novel topic similarity measures, which capture the similarity between the KS documents and Microposts at a conceptual level, considering the enriched representation of these documents.
Extensive evaluation in the context of Emergency Response (ER) and Violence Detection (VD) revealed that our approach outperforms previous approaches that use a single KS without linked data, and approaches that use Twitter data only, by up to 31.4% in terms of F1 measure. Our main findings indicate that the new category graph contains useful information for TC and achieves results comparable to previously used semantic graphs. Furthermore, our results indicate that the accuracy of a topic classifier can be predicted accurately using the enhanced text representation, outperforming previous approaches based on content-based similarity measures.
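
The enrichment idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CONCEPT_CATEGORIES` map, the category names, the substring-based concept matching, the `1 / (depth + 1)` weighting, and the use of cosine similarity are all assumptions chosen for illustration, standing in for a real linked knowledge source and for the paper's own weighted meta-graphs and similarity measures.

```python
import math
from collections import Counter

# Hypothetical concept -> category list, standing in for category links
# in a linked knowledge source. Entries are illustrative only.
CONCEPT_CATEGORIES = {
    "hurricane sandy": ["Hurricanes", "Natural_disasters"],
    "egyptian revolution": ["Revolutions", "Political_violence"],
}


def enrich(post):
    """Return bag-of-words features plus weighted category features."""
    text = post.lower()
    # Lexical features taken from the post's own content.
    features = Counter("w=" + token for token in text.split())
    # Semantic features: categories of every known concept found in the post.
    for concept, categories in CONCEPT_CATEGORIES.items():
        if concept in text:
            for depth, category in enumerate(categories):
                # Assumed weighting: broader categories (later in the list)
                # contribute less than specific ones.
                features["cat=" + category] += 1.0 / (depth + 1)
    return features


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


post = enrich("Hurricane Sandy floods subway")
other = enrich("lovely coffee this morning")
similarity = cosine(post, other)
```

Here `post` carries `cat=` features in addition to its word features, so two documents that mention the same concept share features even when their wording differs, which is the sense in which a similarity computed over the enriched representation operates at a conceptual level.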